DCIR05 Keerti Chalasani 12/7/19
I analyzed the Chicago Food Inspection dataset, focusing on schools, to understand the condition of the schools that passed inspection. I did this by mapping the risk level of schools and then performing sentiment analysis on the violation text.
The Chicago food inspection dataset includes data from 2010 to 2019. Some of the variables in this dataset are the inspection id, dba name, aka name, license #, facility type, risk, zip, inspection date, inspection type, results, violations, latitude, and longitude. In this analysis, I’ll be focusing on the inspection date, violations, risk, and facility type. As loaded here, the dataset has 187787 rows and 13 columns. Looking at the dataset, I realized that a lot of schools were represented. I was curious to see how healthy schools were, since it’s very important for places that hold a lot of children to be clean. I decided to see whether schools that passed the inspection were still a risk to the public. Do schools that pass the inspection have a lower risk to the public? Are the sentiment values of the violation text positive? Do the sentiment values change over time?
rm(list=ls())
library(tidyverse)
## ── Attaching packages ────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✔ ggplot2 3.2.1 ✔ purrr 0.3.3
## ✔ tibble 2.1.3 ✔ dplyr 0.8.3
## ✔ tidyr 1.0.0 ✔ stringr 1.4.0
## ✔ readr 1.3.1 ✔ forcats 0.4.0
## ── Conflicts ───────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(SentimentAnalysis)
##
## Attaching package: 'SentimentAnalysis'
## The following object is masked from 'package:base':
##
## write
library(sentimentr)
library(dplyr)
library(ggplot2)
library(ggmap)
## Google's Terms of Service: https://cloud.google.com/maps-platform/terms/.
## Please cite ggmap if you use it! See citation("ggmap") for details.
library(RColorBrewer)
library(patchwork)
library(here)
## here() starts at /Users/kc/Desktop
food <- read_csv("https://uofi.box.com/shared/static/5637axblfhajotail80yw7j2s4r27hxd.csv",
col_types = cols(Address = col_skip(),
`Census Tracts` = col_skip(), City = col_skip(),
`Community Areas` = col_skip(), `Historical Wards 2003-2015` = col_skip(),
`Inspection Date` = col_date(format = "%m/%d/%Y"),
Location = col_skip(), State = col_skip(),
Wards = col_skip(), `Zip Codes` = col_skip()))
dim(food)
## [1] 187787 13
colnames(food) <- tolower(colnames(food))
head(food)
## # A tibble: 6 x 13
## `inspection id` `dba name` `aka name` `license #` `facility type` risk zip
## <dbl> <chr> <chr> <dbl> <chr> <chr> <dbl>
## 1 2290733 POLLERIA … POLLERIA … 2428612 Poultry Slaugh… Risk… 60608
## 2 2290799 LA FAMILI… LA FAMILI… 2665437 Grocery Store Risk… 60629
## 3 2290743 RED COACH… RED COACH… 45131 Restaurant Risk… 60629
## 4 2290780 UVA KITCH… UVA KITCH… 2647067 Restaurant Risk… 60640
## 5 2290770 TACO BELL TACO BELL 2670614 Restaurant Risk… 60624
## 6 2290739 ARCHIES ARCHIES 2636959 Restaurant Risk… 60626
## # … with 6 more variables: `inspection date` <date>, `inspection type` <chr>,
## # results <chr>, violations <chr>, latitude <dbl>, longitude <dbl>
chi_bb <- c(left = -87.936287,
bottom = 41.679835,
right = -87.447052,
top = 42.000835)
chicago <- get_stamenmap(bbox = chi_bb,
zoom = 12)
schools <- food %>% filter(`facility type` == "School", results == "Pass", risk != "All", risk != "")
ggmap(chicago) + geom_point(data = schools, aes(longitude, latitude, color = factor(risk)), size = .55) + labs(title = "Risk Level for Schools", x = "Longitude", y = "Latitude")
## Warning: Removed 570 rows containing missing values (geom_point).
## Analysis:
This map shows the schools that passed inspection, colored by risk level. Many orange dots appear all around Chicago, meaning that although these schools passed the inspection they are still classified as high risk to the general population. This is surprising: I would have assumed that schools that passed the inspection would be considered healthy and safe, yet they are still labeled as posing a health risk.
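The impression of "many orange dots" can be backed with a simple count per risk level. The following is a minimal sketch on an invented toy data frame, not the real `schools` data; the risk labels used here are assumed to match the dataset's categories.

```r
# Toy stand-in for the `schools` data frame; the values are invented for illustration.
schools_demo <- data.frame(
  risk = c("Risk 1 (High)", "Risk 1 (High)", "Risk 2 (Medium)",
           "Risk 1 (High)", "Risk 3 (Low)"),
  stringsAsFactors = FALSE
)

# Count inspections per risk level and convert the counts to shares.
risk_counts <- table(schools_demo$risk)
risk_share  <- round(prop.table(risk_counts), 2)

print(risk_counts)
print(risk_share)
```

Running `table(schools$risk)` on the real data would give the actual split between high, medium, and low risk.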
# Combine both conditions with `&`. Passed as a separate argument, the facility-type
# condition is matched to subset()'s `drop` argument and the School filter is never applied.
school_pass = subset(food, results == "Pass" & `facility type` == "School", select=c(violations,risk,`inspection date`,longitude, latitude))
school_pass = drop_na(school_pass)
mysample <- school_pass[sample(1:nrow(school_pass), 700),]
calculate_sentiment <- function(x){
violations_sentiments <- sentiment(x)
return( mean(violations_sentiments[violations_sentiments$word_count > 5]$sentiment))
}
sentiment_school = lapply(mysample$violations, calculate_sentiment)
sentiment_vector = unlist(sentiment_school)
#is.vector(sentiment_vector)
mysample$sentiment = sentiment_vector
I then calculated the sentiment values for each violation text for schools that passed. I was only able to run this on a random sample of 700 rows because my computer could not handle more data than that and RStudio kept crashing.
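One possible way to reduce the load is to score all texts in a single vectorised call instead of calling `sentiment()` once per row. The sketch below uses `get_sentences()` and `sentiment_by()` from sentimentr on invented example comments. It is an alternative, not the method used above: `sentiment_by()` applies the package's default averaging function rather than a plain mean, and it does not drop sentences of five words or fewer.

```r
library(sentimentr)

# Invented example comments standing in for mysample$violations.
texts <- c(
  "Floors are clean and well maintained. No issues were observed during the visit.",
  "Observed evidence of rodent droppings near the storage area. Must clean and sanitize.",
  "All food is stored properly and staff follow good hygiene practices."
)

# Split every text into sentences once, then average the sentence scores per text.
sentences <- get_sentences(texts)
scores <- sentiment_by(sentences)

# One row per input text; ave_sentiment holds the averaged polarity.
texts_scored <- data.frame(text = texts, sentiment = scores$ave_sentiment)
print(texts_scored)
```

Whether this is fast enough for the full dataset on a given machine would still need to be tried.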
summary(mysample$sentiment)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -0.41079 0.08889 0.17689 0.18050 0.27003 0.78262
I then did a summary of the sentiment values and found that the median sentiment is 0.177 (mean 0.181), which shows that most of the violation text sounds positive overall, but the value is still very low. This possibly means that although the text sounds positive, it still does not sound great.
ggmap(chicago) + geom_point(data = mysample, aes(longitude, latitude, color = sentiment), size = .9) + labs(title = "Sentiment Values for Schools", x = "Longitude", y = "Latitude") + scale_colour_gradient(low="coral", high="steelblue")
## Warning: Removed 36 rows containing missing values (geom_point).
I then mapped the sentiment values to see visually whether there were more positive values than negative values. On this map, most of the points are closer to the orange (low) end of the color scale than to the blue (high) end. This indicates that the violation text for schools that passed still sounds negative or close to neutral.
ggplot(data = mysample, aes( x = `inspection date` , y = sentiment)) +
geom_line(color = "steelblue") + ggtitle("Sentiment Values Over Time")
Out of curiosity, I wanted to see if the sentiment values for the schools changed over time. I plotted the sentiments against the inspection dates from 2010 to 2019 and found no clear trend over time.
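A line plot of individual inspections is noisy, so averaging by year can make a trend (or its absence) easier to see. This is a minimal sketch on invented values; the column names `inspection_date` and `sentiment` are stand-ins for the corresponding columns of `mysample`.

```r
# Invented stand-in for mysample: inspection dates and sentiment scores.
demo <- data.frame(
  inspection_date = as.Date(c("2010-03-01", "2010-09-15", "2015-05-20",
                              "2015-11-02", "2019-01-10", "2019-06-30")),
  sentiment = c(0.10, 0.30, 0.15, 0.25, 0.20, 0.40)
)

# Extract the calendar year and average the scores within each year.
demo$year <- as.integer(format(demo$inspection_date, "%Y"))
yearly <- aggregate(sentiment ~ year, data = demo, FUN = mean)
print(yearly)

# Fit a straight line through the yearly means; a slope near zero
# would be consistent with "no change over time".
trend <- lm(sentiment ~ year, data = yearly)
print(coef(trend)[["year"]])
```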
In conclusion, the analysis suggests that although schools pass the health inspection, many are still classified as high risk. Mapping the schools in Chicago by risk level showed that most were considered high risk while very few were considered low or medium risk. After performing sentiment analysis on the violation text of schools that passed, I found that the average sentiment value was still low, at about 0.18. This suggests that even when a school passed, the inspector did not have many positive comments about it. Mapping the sentiments over Chicago showed that most fall toward the orange end of the scale, meaning they are close to zero or negative. The sentiment values over time do not show a pattern: they fluctuate widely from one inspection to the next but show no overall trend. I believe this means that schools are still not as healthy for children as they should be. It might be time to consider additional or different inspections to ensure that schools that pass are actually healthy.
This dataset contains the profile data for OkCupid users in the city of San Francisco. The dataset consists of 59,946 records. It has many different variables, including 10 essay questions, which are what I will be focusing on in this report.
A common understanding (or misconception) when it comes to dating is that women tend to look for more love, affection, and commitment than men. I will be using the OkCupid data for all the users in the city of San Francisco and comparing essay responses between men and women. I am curious to see if women sound more positive in their essays than men and if women use words like ‘love’ and ‘commitment’ more often. I will be focusing specifically on essay0 for both men and women from all age groups. I will perform sentiment analysis on the essays and then create a couple of plots of the most frequent words used.
The questions I will be asking include: Is the sentiment of the essay0 responses more positive for one sex than the other? Do women say the word ‘love’ more in their essays?
oc <- read_csv("https://uofi.box.com/shared/static/oy32nc373w4jqz3kummksnw6wvhfrl7a.csv",
col_types = cols(last_online = col_datetime(format = "%Y-%m-%d-%H-%M")))
head(oc)
## # A tibble: 6 x 31
## age body_type diet drinks drugs education essay0 essay1 essay2 essay3
## <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 22 a little… stri… socia… never working … "abou… "curr… "maki… "the …
## 2 35 average most… often some… working … "i am… dedic… "bein… <NA>
## 3 38 thin anyt… socia… <NA> graduate… "i'm … "i ma… "impr… "my l…
## 4 23 thin vege… socia… <NA> working … i wor… readi… "play… socia…
## 5 29 athletic <NA> socia… never graduate… "hey … work … "crea… i smi…
## 6 29 average most… socia… <NA> graduate… "i'm … "buil… "imag… "i ha…
## # … with 21 more variables: essay4 <chr>, essay5 <chr>, essay6 <chr>,
## # essay7 <chr>, essay8 <chr>, essay9 <chr>, ethnicity <chr>, height <dbl>,
## # income <dbl>, job <chr>, last_online <dttm>, location <chr>,
## # offspring <chr>, orientation <chr>, pets <chr>, religion <chr>, sex <chr>,
## # sign <chr>, smokes <chr>, speaks <chr>, status <chr>
colnames(oc) <- tolower(colnames(oc))
women = subset(oc, oc$sex == "f", select=c(sex, essay0))
women = drop_na(women)
mysample <- women[sample(1:nrow(women), 500),]
calculate_sentiment <- function(x){
violations_sentiments <- sentiment(x)
return( mean(violations_sentiments[violations_sentiments$word_count > 5]$sentiment))
}
women_sentiment0 = lapply(mysample$essay0, calculate_sentiment)
women_sent = unlist(women_sentiment0)
summary(women_sent)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## -0.4773 0.1447 0.2650 0.2919 0.4159 1.7133 11
The mean sentiment for women in essay0 is 0.2919 (median 0.2650). Of the 500 sampled essays, 11 produced no score.
set.seed(2019)
si <- sample(1:nrow(women),20) #random sample of 20 rows
library(tm)
## Loading required package: NLP
##
## Attaching package: 'NLP'
## The following object is masked from 'package:ggplot2':
##
## annotate
e8 <- data.frame(doc_id=si,text=women$essay0[si],stringsAsFactors = FALSE)
corpus <- VCorpus(DataframeSource(e8))
tryTolower <- function(x){
y = NA
try_error = tryCatch(tolower(x), error = function(e) e)
if (!inherits(try_error, 'error'))
y = tolower(x)
return(y)
}
clean.corpus<-function(corpus){
corpus <- tm_map(corpus, content_transformer(tryTolower))
corpus <- tm_map(corpus, removeWords, stopwords('english'))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, stripWhitespace)
corpus <- tm_map(corpus, removeNumbers)
return(corpus)
}
newcorpus <- clean.corpus(corpus)
tdm<-TermDocumentMatrix(newcorpus, control=list(weighting=weightTf))
tdm.essay0 <- as.matrix(tdm)
sfq <- data.frame(words=names(sort(rowSums(tdm.essay0),decreasing = TRUE)), freqs=sort(rowSums(tdm.essay0),decreasing = TRUE), row.names = NULL)
sfq
## words freqs
## 1 love 25
## 2 someone 18
## 3 like 17
## 4 life 16
## 5 want 14
## 6 time 13
## 7 looking 12
## 8 play 12
## 9 new 10
## 10 can 9
## 11 family 9
## 12 friends 9
## 13 just 9
## 14 know 9
## 15 meet 9
## 16 people 9
## 17 pretty 9
## 18 really 9
## 19 years 9
## 20 classilink 8
## 21 enjoy 7
## 22 fun 7
## 23 right 7
## 24 things 7
## 25 nice 6
## 26 open 6
## 27 person 6
## 28 still 6
## 29 sure 6
## 30 food 5
## 31 going 5
## 32 hard 5
## 33 href 5
## 34 now 5
## 35 one 5
## 36 way 5
## 37 work 5
## 38 world 5
## 39 year 5
## 40 adventures 4
## 41 almost 4
## 42 also 4
## 43 anything 4
## 44 area 4
## 45 bay 4
## 46 big 4
## 47 day 4
## 48 exploring 4
## 49 feeling 4
## 50 find 4
## 51 first 4
## 52 games 4
## 53 good 4
## 54 happy 4
## 55 heart 4
## 56 make 4
## 57 rather 4
## 58 relationship 4
## 59 since 4
## 60 working 4
## 61 although 3
## 62 around 3
## 63 away 3
## 64 back 3
## 65 best 3
## 66 born 3
## 67 comes 3
## 68 communication 3
## 69 connections 3
## 70 even 3
## 71 ever 3
## 72 feels 3
## 73 feet 3
## 74 great 3
## 75 helping 3
## 76 home 3
## 77 important 3
## 78 interested 3
## 79 involving 3
## 80 kind 3
## 81 laugh 3
## 82 learned 3
## 83 learning 3
## 84 lot 3
## 85 loving 3
## 86 making 3
## 87 married 3
## 88 much 3
## 89 music 3
## 90 old 3
## 91 others 3
## 92 passionate 3
## 93 past 3
## 94 personal 3
## 95 raised 3
## 96 san 3
## 97 sarcastic 3
## 98 say 3
## 99 self 3
## 100 spent 3
## 101 usually 3
## 102 will 3
## 103 ago 2
## 104 alone 2
## 105 always 2
## 106 amp 2
## 107 avid 2
## 108 awesome 2
## 109 awkward 2
## 110 bit 2
## 111 butch 2
## 112 came 2
## 113 canadian 2
## 114 chicago 2
## 115 class 2
## 116 coast 2
## 117 come 2
## 118 comfortable 2
## 119 confident 2
## 120 connected 2
## 121 consider 2
## 122 control 2
## 123 currently 2
## 124 dates 2
## 125 dating 2
## 126 drums 2
## 127 earlier 2
## 128 earth 2
## 129 east 2
## 130 elements 2
## 131 especially 2
## 132 ethic 2
## 133 experiences 2
## 134 explore 2
## 135 extremely 2
## 136 finding 2
## 137 five 2
## 138 forward 2
## 139 francisco 2
## 140 free 2
## 141 friend 2
## 142 general 2
## 143 generous 2
## 144 getting 2
## 145 girl 2
## 146 girlfriend 2
## 147 got 2
## 148 grad 2
## 149 hello 2
## 150 hiking 2
## 151 hope 2
## 152 hopefully 2
## 153 humor 2
## 154 identity 2
## 155 ilink 2
## 156 independent 2
## 157 instead 2
## 158 interests 2
## 159 job 2
## 160 journey 2
## 161 knowing 2
## 162 laid 2
## 163 last 2
## 164 let 2
## 165 live 2
## 166 lived 2
## 167 long 2
## 168 looks 2
## 169 low 2
## 170 makes 2
## 171 many 2
## 172 may 2
## 173 midwest 2
## 174 name 2
## 175 never 2
## 176 occupy 2
## 177 outside 2
## 178 overall 2
## 179 park 2
## 180 partner 2
## 181 passion 2
## 182 place 2
## 183 poly 2
## 184 proud 2
## 185 put 2
## 186 queer 2
## 187 real 2
## 188 relationships 2
## 189 restaurant 2
## 190 rock 2
## 191 school 2
## 192 schoolbr 2
## 193 sense 2
## 194 sex 2
## 195 sexual 2
## 196 silly 2
## 197 small 2
## 198 smart 2
## 199 something 2
## 200 sometimes 2
## 201 special 2
## 202 spending 2
## 203 state 2
## 204 stuff 2
## 205 sweet 2
## 206 take 2
## 207 taken 2
## 208 taking 2
## 209 talking 2
## 210 tattoos 2
## 211 think 2
## 212 trees 2
## 213 value 2
## 214 video 2
## 215 walk 2
## 216 water 2
## 217 well 2
## 218 witty 2
## 219 woman 2
## 220 yet 2
## 221 able 1
## 222 abroad 1
## 223 abundance 1
## 224 accessorize 1
## 225 across 1
## 226 activities 1
## 227 actually 1
## 228 add 1
## 229 adrenaline 1
## 230 adventure 1
## 231 adventuresbr 1
## 232 agressive 1
## 233 air 1
## 234 aligned 1
## 235 allllll 1
## 236 amazing 1
## 237 america 1
## 238 american 1
## 239 among 1
## 240 amuse 1
## 241 andrea 1
## 242 anybody 1
## 243 apparently 1
## 244 appreciate 1
## 245 appreciative 1
## 246 arbitrary 1
## 247 argentinean 1
## 248 arm 1
## 249 arts 1
## 250 asidesbr 1
## 251 atrocious 1
## 252 attention 1
## 253 august 1
## 254 babies 1
## 255 bag 1
## 256 balance 1
## 257 band 1
## 258 banter 1
## 259 bar 1
## 260 bart 1
## 261 bawdy 1
## 262 beachbr 1
## 263 beautiful 1
## 264 begin 1
## 265 believe 1
## 266 better 1
## 267 bicyclist 1
## 268 bike 1
## 269 bills 1
## 270 bittersweet 1
## 271 bizarrebr 1
## 272 block 1
## 273 blond 1
## 274 board 1
## 275 body 1
## 276 bother 1
## 277 bound 1
## 278 bowling 1
## 279 boyfriend 1
## 280 brain 1
## 281 breaking 1
## 282 breathe 1
## 283 bright 1
## 284 broad 1
## 285 broadcasting 1
## 286 brooklyn 1
## 287 brunch 1
## 288 building 1
## 289 business 1
## 290 california 1
## 291 camera 1
## 292 camper 1
## 293 camping 1
## 294 car 1
## 295 care 1
## 296 career 1
## 297 caring 1
## 298 carving 1
## 299 casual 1
## 300 certain 1
## 301 challenge 1
## 302 changes 1
## 303 charmer 1
## 304 chase 1
## 305 chat 1
## 306 check 1
## 307 chemistry 1
## 308 child 1
## 309 childhood 1
## 310 choicebr 1
## 311 choose 1
## 312 circle 1
## 313 city 1
## 314 citybr 1
## 315 clever 1
## 316 closely 1
## 317 closer 1
## 318 clothes 1
## 319 clothing 1
## 320 clothingbr 1
## 321 coaster 1
## 322 coffee 1
## 323 college 1
## 324 collegebr 1
## 325 comedy 1
## 326 commenting 1
## 327 commitment 1
## 328 committed 1
## 329 communicate 1
## 330 company 1
## 331 composting 1
## 332 computer 1
## 333 configuration 1
## 334 connecting 1
## 335 connection 1
## 336 considerate 1
## 337 constantly 1
## 338 conversation 1
## 339 cooking 1
## 340 cool 1
## 341 coparents 1
## 342 couple 1
## 343 course 1
## 344 courtship 1
## 345 coworkers 1
## 346 creating 1
## 347 creative 1
## 348 crucial 1
## 349 culture 1
## 350 curls 1
## 351 curvy 1
## 352 custody 1
## 353 cut 1
## 354 dadbr 1
## 355 dancing 1
## 356 dark 1
## 357 dashing 1
## 358 dayday 1
## 359 days 1
## 360 dealbreakers 1
## 361 dear 1
## 362 deeply 1
## 363 definitely 1
## 364 definitions 1
## 365 describe 1
## 366 design 1
## 367 designing 1
## 368 details 1
## 369 developments 1
## 370 different 1
## 371 disney 1
## 372 dominant 1
## 373 dorky 1
## 374 drug 1
## 375 dry 1
## 376 dyke 1
## 377 early 1
## 378 earned 1
## 379 ears 1
## 380 easy 1
## 381 eating 1
## 382 electronic 1
## 383 else 1
## 384 emotional 1
## 385 endeavoring 1
## 386 energizes 1
## 387 energy 1
## 388 enjoyed 1
## 389 enjoyment 1
## 390 enough 1
## 391 era 1
## 392 estsy 1
## 393 etc 1
## 394 everbr 1
## 395 everyday 1
## 396 everyone 1
## 397 everything 1
## 398 everywhere 1
## 399 exciting 1
## 400 exeast 1
## 401 exemplary 1
## 402 exhusband 1
## 403 exnew 1
## 404 expectations 1
## 405 experience 1
## 406 explorer 1
## 407 extent 1
## 408 extrovert 1
## 409 extroverted 1
## 410 factsbr 1
## 411 fall 1
## 412 fast 1
## 413 father 1
## 414 favorite 1
## 415 feel 1
## 416 felt 1
## 417 femme 1
## 418 field 1
## 419 fifteen 1
## 420 figure 1
## 421 filled 1
## 422 finally 1
## 423 firstbr 1
## 424 fit 1
## 425 fixation 1
## 426 flaunt 1
## 427 fluid 1
## 428 fluidity 1
## 429 foodie 1
## 430 foreign 1
## 431 foremost 1
## 432 fosteringbr 1
## 433 france 1
## 434 franciscobr 1
## 435 french 1
## 436 fresh 1
## 437 friendly 1
## 438 fringe 1
## 439 frustrated 1
## 440 fullybaked 1
## 441 funny 1
## 442 furious 1
## 443 furniture 1
## 444 gamebr 1
## 445 garden 1
## 446 gas 1
## 447 geekbr 1
## 448 genderqueer 1
## 449 genghis 1
## 450 genres 1
## 451 get 1
## 452 gets 1
## 453 ghost 1
## 454 girlbr 1
## 455 gone 1
## 456 goodbyes 1
## 457 graduate 1
## 458 grammar 1
## 459 grew 1
## 460 growing 1
## 461 guaranteed 1
## 462 guess 1
## 463 guys 1
## 464 hands 1
## 465 happiness 1
## 466 hardware 1
## 467 hardworking 1
## 468 harry 1
## 469 hate 1
## 470 healthy 1
## 471 hearted 1
## 472 hell 1
## 473 hey 1
## 474 hide 1
## 475 hikes 1
## 476 hippies 1
## 477 hips 1
## 478 hold 1
## 479 honest 1
## 480 honestly 1
## 481 hoodies 1
## 482 hot 1
## 483 hours 1
## 484 however 1
## 485 hrefinterestsbadjokesbad 1
## 486 hrefinterestsdesigndesign 1
## 487 hrefinterestseameseames 1
## 488 hrefinterestsguitarguitar 1
## 489 hrefinterestsviolinviolin 1
## 490 hugs 1
## 491 humble 1
## 492 huuuge 1
## 493 hypergendered 1
## 494 idea 1
## 495 idealsbr 1
## 496 impact 1
## 497 imperfect 1
## 498 income 1
## 499 incredibly 1
## 500 indulgence 1
## 501 indulgences 1
## 502 inquisitive 1
## 503 intentional 1
## 504 interacting 1
## 505 interesting 1
## 506 interestsarchitecturearchitecture 1
## 507 interestsbikesbikes 1
## 508 interestscampingcamping 1
## 509 interestsmuirmuir 1
## 510 interestsvintageclothesvintage 1
## 511 intuitive 1
## 512 invention 1
## 513 involved 1
## 514 irritating 1
## 515 jobs 1
## 516 joke 1
## 517 jokes 1
## 518 joy 1
## 519 judged 1
## 520 junkybr 1
## 521 khunt 1
## 522 kids 1
## 523 kindness 1
## 524 laidback 1
## 525 language 1
## 526 latest 1
## 527 laughing 1
## 528 laughter 1
## 529 law 1
## 530 leaps 1
## 531 learn 1
## 532 left 1
## 533 lifebr 1
## 534 lifestyle 1
## 535 lifetime 1
## 536 likely 1
## 537 likes 1
## 538 little 1
## 539 lives 1
## 540 living 1
## 541 locally 1
## 542 london 1
## 543 lovers 1
## 544 loves 1
## 545 loyal 1
## 546 made 1
## 547 maintenance 1
## 548 man 1
## 549 marriage 1
## 550 masstransit 1
## 551 matters 1
## 552 maybe 1
## 553 means 1
## 554 medical 1
## 555 meeting 1
## 556 mellow 1
## 557 mentioned 1
## 558 mexico 1
## 559 mids 1
## 560 migrated 1
## 561 milf 1
## 562 mills 1
## 563 mindful 1
## 564 money 1
## 565 moved 1
## 566 movie 1
## 567 movies 1
## 568 moviesbr 1
## 569 naturelove 1
## 570 naviating 1
## 571 nearly 1
## 572 neatly 1
## 573 need 1
## 574 news 1
## 575 nights 1
## 576 nothing 1
## 577 notice 1
## 578 numerous 1
## 579 nyc 1
## 580 nymag 1
## 581 obligebr 1
## 582 obscure 1
## 583 obsessed 1
## 584 obsessing 1
## 585 older 1
## 586 online 1
## 587 opportunities 1
## 588 opportunity 1
## 589 option 1
## 590 oriented 1
## 591 originally 1
## 592 otherwise 1
## 593 outdoor 1
## 594 outdoors 1
## 595 outdoorsbr 1
## 596 outlook 1
## 597 outsidebr 1
## 598 overly 1
## 599 paramount 1
## 600 parenthetical 1
## 601 particularly 1
## 602 partners 1
## 603 parts 1
## 604 partying 1
## 605 pass 1
## 606 passions 1
## 607 pathbr 1
## 608 pay 1
## 609 peoplebr 1
## 610 performance 1
## 611 perhaps 1
## 612 persons 1
## 613 perspective 1
## 614 phone 1
## 615 physically 1
## 616 pic 1
## 617 picnics 1
## 618 picture 1
## 619 places 1
## 620 playbr 1
## 621 playing 1
## 622 please 1
## 623 pleasure 1
## 624 point 1
## 625 pointing 1
## 626 polyamorous 1
## 627 polybr 1
## 628 polysaturated 1
## 629 posh 1
## 630 positive 1
## 631 possible 1
## 632 posted 1
## 633 potter 1
## 634 power 1
## 635 prefer 1
## 636 problem 1
## 637 produce 1
## 638 profile 1
## 639 program 1
## 640 prove 1
## 641 proverbial 1
## 642 provided 1
## 643 pushing 1
## 644 queers 1
## 645 quests 1
## 646 quickly 1
## 647 quite 1
## 648 quote 1
## 649 racing 1
## 650 radical 1
## 651 random 1
## 652 randomass 1
## 653 read 1
## 654 reader 1
## 655 reading 1
## 656 realizing 1
## 657 realm 1
## 658 reason 1
## 659 reasonably 1
## 660 recent 1
## 661 reforming 1
## 662 regimented 1
## 663 relating 1
## 664 relaxed 1
## 665 relocated 1
## 666 remain 1
## 667 repartee 1
## 668 respect 1
## 669 revamp 1
## 670 ride 1
## 671 rightsbr 1
## 672 rolly 1
## 673 romance 1
## 674 romantic 1
## 675 roots 1
## 676 rowbr 1
## 677 rowing 1
## 678 rules 1
## 679 run 1
## 680 running 1
## 681 rushing 1
## 682 said 1
## 683 sales 1
## 684 sarcasticbr 1
## 685 sassy 1
## 686 save 1
## 687 savvy 1
## 688 says 1
## 689 scarcitydriven 1
## 690 see 1
## 691 seeing 1
## 692 seeking 1
## 693 seeks 1
## 694 selfexploration 1
## 695 selfreliant 1
## 696 sell 1
## 697 selling 1
## 698 send 1
## 699 september 1
## 700 service 1
## 701 sexpositivity 1
## 702 sexually 1
## 703 sexy 1
## 704 sfsu 1
## 705 shaped 1
## 706 share 1
## 707 sharing 1
## 708 shifts 1
## 709 shit 1
## 710 shows 1
## 711 shy 1
## 712 silence 1
## 713 simple 1
## 714 site 1
## 715 sky 1
## 716 sleazy 1
## 717 sly 1
## 718 smile 1
## 719 snobby 1
## 720 solid 1
## 721 south 1
## 722 souzabee 1
## 723 space 1
## 724 spain 1
## 725 spar 1
## 726 spelling 1
## 727 spiritually 1
## 728 spontaneity 1
## 729 spontaneous 1
## 730 spots 1
## 731 states 1
## 732 strong 1
## 733 student 1
## 734 studying 1
## 735 stumble 1
## 736 suits 1
## 737 sunlight 1
## 738 sweep 1
## 739 swim 1
## 740 swimmingbr 1
## 741 switchy 1
## 742 table 1
## 743 talk 1
## 744 tall 1
## 745 tastings 1
## 746 taught 1
## 747 teacher 1
## 748 teeth 1
## 749 tell 1
## 750 temporary 1
## 751 ten 1
## 752 terms 1
## 753 tests 1
## 754 textbr 1
## 755 theses 1
## 756 thing 1
## 757 though 1
## 758 thoughbr 1
## 759 timebr 1
## 760 times 1
## 761 tired 1
## 762 togetherand 1
## 763 tomorrow 1
## 764 touch 1
## 765 tourist 1
## 766 town 1
## 767 transported 1
## 768 travel 1
## 769 traveled 1
## 770 tropical 1
## 771 trucks 1
## 772 trusting 1
## 773 truthfully 1
## 774 try 1
## 775 turn 1
## 776 two 1
## 777 types 1
## 778 undercover 1
## 779 understand 1
## 780 understanding 1
## 781 upbeat 1
## 782 urban 1
## 783 use 1
## 784 used 1
## 785 variety 1
## 786 veto 1
## 787 views 1
## 788 vintage 1
## 789 visit 1
## 790 vital 1
## 791 vocabulary 1
## 792 wake 1
## 793 walking 1
## 794 wandering 1
## 795 wanted 1
## 796 wants 1
## 797 warm 1
## 798 warmth 1
## 799 watching 1
## 800 waybr 1
## 801 weather 1
## 802 weekend 1
## 803 welcome 1
## 804 wellbrought 1
## 805 went 1
## 806 west 1
## 807 whatever 1
## 808 whenever 1
## 809 whose 1
## 810 wide 1
## 811 winning 1
## 812 words 1
## 813 worry 1
## 814 wrestling 1
## 815 writer 1
## 816 writing 1
## 817 yep 1
## 818 yes 1
## 819 yorker 1
## 820 youths 1
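Tokens such as `classilink`, `href`, and `schoolbr` in the table above are most likely residue of HTML markup in the essays (for example `<br />` and `<a class="ilink" href=...>`) that gets glued onto neighbouring words once punctuation is removed. A minimal sketch of stripping the markup first, using only base R; `strip_html` is a hypothetical helper, not part of tm, and the example string is invented.

```r
# Remove HTML tags and entities before the text reaches the tm cleaning steps.
strip_html <- function(x) {
  x <- gsub("<[^>]+>", " ", x)        # tags such as <br /> or <a href="..."> become a space
  x <- gsub("&[a-zA-Z]+;", " ", x)    # entities such as &amp;
  x <- gsub("[[:space:]]+", " ", x)   # collapse the extra whitespace this leaves behind
  trimws(x)
}

# Invented example in the style of the essays.
raw <- 'i love hiking<br />\nand <a class="ilink" href="/interests?i=camping">camping</a> &amp; music'
clean <- strip_html(raw)
print(clean)
```

In the pipeline above this could be applied when building the data frame, for example `text = strip_html(women$essay0[si])`, so that the frequency counts reflect only the words the users wrote.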
ggplot(sfq[1:20,], mapping = aes(x = reorder(words, freqs), y = freqs)) +
geom_bar(stat= "identity", fill="#d598a3") +
coord_flip() +
scale_colour_hue() +
labs(x= "Words", title = "20 Most Frequent Words (Essay0 Subset for Women)") +
theme(panel.background = element_blank(), axis.ticks.x = element_blank(),axis.ticks.y = element_blank())
library(wordcloud)
wordcloud(sfq$words,sfq$freqs, min.freq = 1, max.words = 30, colors="#d598a3")
According to the bar chart and word cloud, ‘love’ is the most frequent word in the sampled women's essays, along with words like ‘someone’, ‘want’, ‘family’, and ‘years’. I believe this shows that women tend to use words that sound like they are looking for a commitment.
men = subset(oc, oc$sex == "m", select=c(sex, essay0))
men = drop_na(men)
sm <- men[sample(1:nrow(men), 500),]
calculate_sentiment <- function(x){
violations_sentiments <- sentiment(x)
return( mean(violations_sentiments[violations_sentiments$word_count > 5]$sentiment))
}
men_sentiment0 = lapply(sm$essay0, calculate_sentiment)
men_sent = unlist(men_sentiment0)
summary(men_sent)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## -0.3361 0.1194 0.2299 0.2531 0.3599 1.4199 17
The mean sentiment for men in essay0 is 0.2531 (median 0.2299). Of the 500 sampled essays, 17 produced no score.
The mean sentiment for men and women is similar (about 0.25 versus 0.29), so this sample does not clearly show that women sound more positive than men.
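Whether a gap of this size is more than sampling noise could be checked with a two-sample test. The sketch below runs Welch's t-test on simulated placeholder scores; the means and standard deviations are invented for illustration and are not results from this report. In the report, the inputs would be the `women_sent` and `men_sent` vectors.

```r
set.seed(1)

# Simulated placeholder scores, NOT the report's data.
women_demo <- rnorm(500, mean = 0.29, sd = 0.22)
men_demo   <- rnorm(500, mean = 0.25, sd = 0.20)

# Welch's t-test does not assume equal variances in the two groups;
# t.test() drops missing values itself.
result <- t.test(women_demo, men_demo)

print(result$estimate)   # the two sample means
print(result$conf.int)   # 95% confidence interval for the difference in means
print(result$p.value)
```

A confidence interval that includes zero would support the reading that the two groups sound about equally positive.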
set.seed(2019)
sm <- sample(1:nrow(men),20) #random sample of 20 rows
library(tm)
e8 <- data.frame(doc_id=sm,text=men$essay0[sm],stringsAsFactors = FALSE)
corpus <- VCorpus(DataframeSource(e8))
newcorpus_m <- clean.corpus(corpus)
tdm_men<-TermDocumentMatrix(newcorpus_m, control=list(weighting=weightTf))
tdm_men.essay0 <- as.matrix(tdm_men)
sfq_men <- data.frame(words=names(sort(rowSums(tdm_men.essay0),decreasing = TRUE)), freqs=sort(rowSums(tdm_men.essay0),decreasing = TRUE), row.names = NULL)
sfq_men
## words freqs
## 1 life 15
## 2 things 10
## 3 kind 8
## 4 looking 8
## 5 new 8
## 6 like 7
## 7 one 7
## 8 people 7
## 9 can 6
## 10 good 6
## 11 someone 6
## 12 always 5
## 13 around 5
## 14 friends 5
## 15 guy 5
## 16 meet 5
## 17 now 5
## 18 say 5
## 19 work 5
## 20 find 4
## 21 fun 4
## 22 humor 4
## 23 see 4
## 24 share 4
## 25 time 4
## 26 trying 4
## 27 working 4
## 28 world 4
## 29 anyone 3
## 30 best 3
## 31 bit 3
## 32 enjoy 3
## 33 exploring 3
## 34 friend 3
## 35 going 3
## 36 last 3
## 37 living 3
## 38 love 3
## 39 move 3
## 40 partner 3
## 41 pretty 3
## 42 really 3
## 43 san 3
## 44 six 3
## 45 soon 3
## 46 start 3
## 47 taken 3
## 48 thing 3
## 49 think 3
## 50 try 3
## 51 will 3
## 52 year 3
## 53 also 2
## 54 amet 2
## 55 awkward 2
## 56 back 2
## 57 bad 2
## 58 bay 2
## 59 believe 2
## 60 big 2
## 61 born 2
## 62 business 2
## 63 california 2
## 64 chill 2
## 65 consectetur 2
## 66 currently 2
## 67 dance 2
## 68 dominating 2
## 69 earth 2
## 70 east 2
## 71 easygoing 2
## 72 enjoying 2
## 73 family 2
## 74 francisco 2
## 75 full 2
## 76 great 2
## 77 grew 2
## 78 group 2
## 79 interesting 2
## 80 interests 2
## 81 ipsum 2
## 82 job 2
## 83 just 2
## 84 keep 2
## 85 know 2
## 86 knows 2
## 87 learning 2
## 88 lorem 2
## 89 make 2
## 90 may 2
## 91 meeting 2
## 92 met 2
## 93 minds 2
## 94 moved 2
## 95 much 2
## 96 mushrooms 2
## 97 never 2
## 98 newbr 2
## 99 nice 2
## 100 open 2
## 101 passionate 2
## 102 places 2
## 103 playing 2
## 104 profile 2
## 105 quiet 2
## 106 quis 2
## 107 raised 2
## 108 read 2
## 109 recently 2
## 110 school 2
## 111 sit 2
## 112 somewhere 2
## 113 summarize 2
## 114 sweden 2
## 115 take 2
## 116 technology 2
## 117 though 2
## 118 thoughts 2
## 119 told 2
## 120 type 2
## 121 unique 2
## 122 want 2
## 123 wonderful 2
## 124 writing 2
## 125 years 2
## 126 academy 1
## 127 accelerate 1
## 128 activities 1
## 129 actually 1
## 130 adapting 1
## 131 adipiscing 1
## 132 adventure 1
## 133 adventures 1
## 134 afforded 1
## 135 ago 1
## 136 aliquam 1
## 137 aliquet 1
## 138 along 1
## 139 alto 1
## 140 american 1
## 141 amp 1
## 142 anything 1
## 143 apply 1
## 144 approximately 1
## 145 art 1
## 146 asshole 1
## 147 attempting 1
## 148 attract 1
## 149 australia 1
## 150 average 1
## 151 background 1
## 152 ball 1
## 153 band 1
## 154 beach 1
## 155 become 1
## 156 behavioral 1
## 157 beyond 1
## 158 blessed 1
## 159 board 1
## 160 book 1
## 161 borderline 1
## 162 broad 1
## 163 brought 1
## 164 btw 1
## 165 building 1
## 166 bupdateb 1
## 167 cattle 1
## 168 chance 1
## 169 change 1
## 170 character 1
## 171 charming 1
## 172 chinese 1
## 173 chose 1
## 174 circle 1
## 175 circumstances 1
## 176 cities 1
## 177 city 1
## 178 coast 1
## 179 cocky 1
## 180 coffee 1
## 181 college 1
## 182 comfortable 1
## 183 coming 1
## 184 community 1
## 185 conceited 1
## 186 consequat 1
## 187 considerate 1
## 188 conspicuous 1
## 189 continents 1
## 190 contradictions 1
## 191 conversations 1
## 192 cook 1
## 193 cooking 1
## 194 cool 1
## 195 coordinate 1
## 196 cordinated 1
## 197 corny 1
## 198 count 1
## 199 course 1
## 200 cras 1
## 201 creative 1
## 202 culture 1
## 203 cupid 1
## 204 cycling 1
## 205 cynical 1
## 206 cynicism 1
## 207 dark 1
## 208 dash 1
## 209 date 1
## 210 daysthats 1
## 211 deal 1
## 212 deals 1
## 213 deep 1
## 214 dependable 1
## 215 derivatives 1
## 216 design 1
## 217 developmental 1
## 218 diamond 1
## 219 dictated 1
## 220 diego 1
## 221 difference 1
## 222 different 1
## 223 difficult 1
## 224 dine 1
## 225 direct 1
## 226 disabilities 1
## 227 dives 1
## 228 dolor 1
## 229 done 1
## 230 dont 1
## 231 dork 1
## 232 drawing 1
## 233 dreams 1
## 234 drinkbr 1
## 235 drive 1
## 236 duis 1
## 237 early 1
## 238 earned 1
## 239 easy 1
## 240 economicsbr 1
## 241 educated 1
## 242 egestas 1
## 243 elit 1
## 244 encouraging 1
## 245 end 1
## 246 endeavor 1
## 247 ended 1
## 248 engineer 1
## 249 engineering 1
## 250 enim 1
## 251 enough 1
## 252 est 1
## 253 euismod 1
## 254 even 1
## 255 eventsbr 1
## 256 eventually 1
## 257 ever 1
## 258 excellent 1
## 259 exciting 1
## 260 exclusive 1
## 261 exotic 1
## 262 expect 1
## 263 experience 1
## 264 experiment 1
## 265 express 1
## 266 eyes 1
## 267 facilisis 1
## 268 fairly 1
## 269 fast 1
## 270 fatherstill 1
## 271 favorite 1
## 272 feelings 1
## 273 feels 1
## 274 felis 1
## 275 feugiat 1
## 276 figured 1
## 277 figuring 1
## 278 filled 1
## 279 finally 1
## 280 finished 1
## 281 firmly 1
## 282 first 1
## 283 fit 1
## 284 five 1
## 285 follow 1
## 286 force 1
## 287 forces 1
## 288 forward 1
## 289 found 1
## 290 frequently 1
## 291 friendly 1
## 292 fringilla 1
## 293 game 1
## 294 games 1
## 295 germany 1
## 296 get 1
## 297 getbr 1
## 298 getting 1
## 299 gives 1
## 300 grad 1
## 301 graduate 1
## 302 graduated 1
## 303 grouponesque 1
## 304 growth 1
## 305 guiding 1
## 306 guitar 1
## 307 guys 1
## 308 hang 1
## 309 hanging 1
## 310 happen 1
## 311 happier 1
## 312 happy 1
## 313 hard 1
## 314 hardest 1
## 315 harmonize 1
## 316 hate 1
## 317 hell 1
## 318 high 1
## 319 highly 1
## 320 hill 1
## 321 hit 1
## 322 hive 1
## 323 hold 1
## 324 homework 1
## 325 honesty 1
## 326 hope 1
## 327 hopefully 1
## 328 hoping 1
## 329 house 1
## 330 howdy 1
## 331 hunt 1
## 332 hypocrisy 1
## 333 idea 1
## 334 idealistic 1
## 335 ignore 1
## 336 impossible 1
## 337 injustice 1
## 338 instead 1
## 339 interdum 1
## 340 internets 1
## 341 introspective 1
## 342 ive 1
## 343 jobs 1
## 344 jokes 1
## 345 juuuuuuuuuuust 1
## 346 kewl 1
## 347 key 1
## 348 kindness 1
## 349 lacking 1
## 350 laugh 1
## 351 learn 1
## 352 line 1
## 353 linguistics 1
## 354 list 1
## 355 little 1
## 356 live 1
## 357 lived 1
## 358 loads 1
## 359 located 1
## 360 long 1
## 361 longer 1
## 362 loserdom 1
## 363 lot 1
## 364 loves 1
## 365 luxury 1
## 366 magical 1
## 367 magnetized 1
## 368 magnets 1
## 369 makes 1
## 370 making 1
## 371 man 1
## 372 many 1
## 373 marriage 1
## 374 masculine 1
## 375 master 1
## 376 maybe 1
## 377 mean 1
## 378 mechanics 1
## 379 meditate 1
## 380 meditation 1
## 381 mellow 1
## 382 message 1
## 383 messages 1
## 384 might 1
## 385 mildly 1
## 386 mind 1
## 387 minded 1
## 388 minimum 1
## 389 minute 1
## 390 missed 1
## 391 misunderstood 1
## 392 momentbr 1
## 393 moments 1
## 394 money 1
## 395 months 1
## 396 monthsbr 1
## 397 morbi 1
## 398 mostly 1
## 399 movie 1
## 400 movies 1
## 401 mushroom 1
## 402 musical 1
## 403 mutually 1
## 404 name 1
## 405 napoleon 1
## 406 nature 1
## 407 nearly 1
## 408 need 1
## 409 news 1
## 410 nicaragua 1
## 411 nisi 1
## 412 non 1
## 413 nonprofit 1
## 414 north 1
## 415 odio 1
## 416 officially 1
## 417 often 1
## 418 okc 1
## 419 okcupid 1
## 420 old 1
## 421 ones 1
## 422 oneself 1
## 423 optimism 1
## 424 originally 1
## 425 ornare 1
## 426 others 1
## 427 otherwise 1
## 428 outcome 1
## 429 packing 1
## 430 padding 1
## 431 palo 1
## 432 paragraph 1
## 433 part 1
## 434 parts 1
## 435 path 1
## 436 peeps 1
## 437 peninsula 1
## 438 perceive 1
## 439 perfectly 1
## 440 person 1
## 441 perspective 1
## 442 photography 1
## 443 pictures 1
## 444 place 1
## 445 play 1
## 446 please 1
## 447 pleasurablestrongbr 1
## 448 postal 1
## 449 prefer 1
## 450 process 1
## 451 programs 1
## 452 projects 1
## 453 promise 1
## 454 pulvinar 1
## 455 puns 1
## 456 quam 1
## 457 quarter 1
## 458 quickly 1
## 459 quite 1
## 460 quotes 1
## 461 rather 1
## 462 reading 1
## 463 reason 1
## 464 rebel 1
## 465 recent 1
## 466 refuse 1
## 467 regardless 1
## 468 relationship 1
## 469 repairteach 1
## 470 replying 1
## 471 research 1
## 472 respond 1
## 473 ride 1
## 474 risk 1
## 475 rock 1
## 476 rolling 1
## 477 romantic 1
## 478 root 1
## 479 rough 1
## 480 roundworld 1
## 481 rustling 1
## 482 sailing 1
## 483 salt 1
## 484 sarcastic 1
## 485 savvy 1
## 486 saybr 1
## 487 scelerisque 1
## 488 science 1
## 489 search 1
## 490 seattle 1
## 491 section 1
## 492 self 1
## 493 selfabsorbed 1
## 494 selfdriven 1
## 495 sense 1
## 496 shit 1
## 497 shopsbr 1
## 498 silencebr 1
## 499 single 1
## 500 sitebr 1
## 501 sitting 1
## 502 situations 1
## 503 sketch 1
## 504 slowly 1
## 505 smart 1
## 506 social 1
## 507 software 1
## 508 solely 1
## 509 somebody 1
## 510 something 1
## 511 sometimes 1
## 512 sorry 1
## 513 sound 1
## 514 special 1
## 515 spent 1
## 516 spots 1
## 517 stanford 1
## 518 startup 1
## 519 stay 1
## 520 stimulating 1
## 521 strong 1
## 522 strongstrong 1
## 523 strongunexpected 1
## 524 success 1
## 525 summer 1
## 526 swear 1
## 527 sweet 1
## 528 sweeter 1
## 529 takes 1
## 530 taking 1
## 531 tap 1
## 532 teaching 1
## 533 teleprompter 1
## 534 tellus 1
## 535 tempor 1
## 536 temporarily 1
## 537 tend 1
## 538 tennis 1
## 539 terms 1
## 540 thats 1
## 541 thoughtful 1
## 542 three 1
## 543 tiny 1
## 544 title 1
## 545 top 1
## 546 totaling 1
## 547 trade 1
## 548 transcendence 1
## 549 travel 1
## 550 treasuring 1
## 551 tries 1
## 552 trip 1
## 553 trips 1
## 554 tristique 1
## 555 trust 1
## 556 turpis 1
## 557 update 1
## 558 value 1
## 559 vapid 1
## 560 varius 1
## 561 views 1
## 562 villain 1
## 563 visiting 1
## 564 vitae 1
## 565 vivamus 1
## 566 volunteering 1
## 567 volutpat 1
## 568 wait 1
## 569 waiting 1
## 570 wake 1
## 571 wanted 1
## 572 wants 1
## 573 watching 1
## 574 way 1
## 575 webs 1
## 576 well 1
## 577 wellbr 1
## 578 whatever 1
## 579 whoever 1
## 580 wide 1
## 581 wish 1
## 582 woman 1
## 583 women 1
## 584 womenbr 1
## 585 wonder 1
## 586 workbr 1
## 587 worked 1
## 588 workerauto 1
## 589 worry 1
## 590 worst 1
## 591 write 1
## 592 yet 1
## 593 yoga 1
## 594 youth 1
## 595 zanzibar 1
# Horizontal bar chart of the 20 most frequent words in the men's essay0 subset
ggplot(sfq_men[1:20, ], mapping = aes(x = reorder(words, freqs), y = freqs)) +
  geom_bar(stat = "identity", fill = "#659ad5") +
  coord_flip() +
  labs(x = "Words", y = "Frequency", title = "20 Most Frequent Words (Essay0 Subset for Men)") +
  theme(panel.background = element_blank(), axis.ticks.x = element_blank(), axis.ticks.y = element_blank())
library(wordcloud)
wordcloud(sfq_men$words,sfq_men$freqs, min.freq = 1, max.words = 30, colors="#659ad5")
According to the bar chart and word cloud created for men, the most frequent word is ‘life’. The word ‘love’ does not appear among the 20 most frequent words; it occurs only 3 times in the sample. Instead, we see words like ‘new’, ‘things’, ‘looking’, and ‘like’. I believe this suggests that the men on OkCupid are not looking for a serious commitment.
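One way to check a claim about where a word ranks is to look up its position in the sorted frequency table directly. The sketch below is only illustrative: `word_rank` and `in_top_n` are hypothetical helper names that are not part of the original analysis, and `sfq_toy` is a small stand-in built from a few rows of the printed table rather than the full `sfq_men` data frame.

```r
# Toy stand-in for sfq_men: a handful of rows copied from the printed table,
# already sorted from most to least frequent (an assumption of both helpers).
sfq_toy <- data.frame(
  words = c("life", "things", "new", "love", "family"),
  freqs = c(15, 10, 8, 3, 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: row position of `word` in a sorted frequency table,
# or NA if the word does not occur at all.
word_rank <- function(sfq, word) {
  match(word, sfq$words)
}

# Hypothetical helper: TRUE if `word` is among the first n rows.
in_top_n <- function(sfq, word, n = 20) {
  pos <- word_rank(sfq, word)
  !is.na(pos) && pos <= n
}

love_rank  <- word_rank(sfq_toy, "love")
love_top3  <- in_top_n(sfq_toy, "love", n = 3)
zebra_rank <- word_rank(sfq_toy, "zebra")
```

Applied to the full table, `word_rank(sfq_men, "love")` would distinguish between a word that is absent and a word that is present but falls outside the plotted top 20.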
In conclusion, the analysis showed that the sentiments of essay0 were the same for both men and women. This was surprising to me since I expected women to sound more positive in their essays than men. Because the sentiment analysis did not give the results I expected, I went on to analyze the frequency of the words used in the essays. I found that women tend to use the word ‘love’ more, while men used it only 3 times in the sample, too rarely for it to reach their 20 most frequent words. I think this suggests that women on OkCupid are looking for more of a commitment, since they also use the words ‘family’ and ‘years’, while men use words like ‘new’ and ‘things’. However, this might not be enough to conclude anything about commitment, since I am basing it on what I believe commitment sounds like. Women may write more words in general, or some other underlying variable may explain why women use the word ‘love’ more than men. This could be an important question that requires further analysis.
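If one group simply writes more words, raw counts are not directly comparable, so a possible follow-up is to compare rates per 1,000 counted words instead. The sketch below assumes a frequency table with `words` and `freqs` columns, as in `sfq_men`; the helper name `rate_per_1000` is hypothetical, and the two toy tables contain made-up numbers that only exercise the function. They are not the OkCupid results.

```r
# Hypothetical helper: occurrences of `word` per 1,000 counted words
# in a frequency table with columns `words` and `freqs`.
rate_per_1000 <- function(sfq, word) {
  hits <- sum(sfq$freqs[sfq$words == word])
  1000 * hits / sum(sfq$freqs)
}

# Made-up toy tables for illustration only (NOT the essay0 data).
toy_a <- data.frame(words = c("love", "life", "new"),
                    freqs = c(6, 10, 4), stringsAsFactors = FALSE)
toy_b <- data.frame(words = c("love", "life", "new"),
                    freqs = c(3, 15, 12), stringsAsFactors = FALSE)

rate_a <- rate_per_1000(toy_a, "love")  # 6 of 20 counted words
rate_b <- rate_per_1000(toy_b, "love")  # 3 of 30 counted words
```

With the real tables, the same call on the men's and the women's frequency data frames would show whether the difference in ‘love’ persists once essay length is taken into account.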